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Abstract 

Changes in regulatory DNA contribute to phenotypic differences within and between taxa. Comparative studies show that 
many transcription factor binding sites (TFBS) are conserved between species whereas functional studies reveal that some 
mutations segregating within species alter TFBS function. Consistently, in this analysis of 13 regulatory elements In 
Drosophlla melanogaster populations, single base and insertion/deletion polymorphism are rare in characterized regulatory 
elements. Experimentally defined TFBS are nearly devoid of segregating mutations and, as has been shown before, are quite 
conserved. For Instance 8 of 1 1 Hunchback binding sites in the stripe 3+7 enhancer of even-skipped are conserved between 
D. melanogaster and Drosophila virilis. Oddly, we found a 72 bp deletion that removes one of these binding sites (Hb8), 
segregating within D. melanogaster. Furthermore, a 45 bp deletion polymorphism in the spacer between the stripe 3+7 and 
stripe 2 enhancers, removes another predicted Hunchback site. These two deletions are separated by ~250 bp, sit on 
distinct haplotypes, and segregate at appreciable frequency. The HbSA is at 5 to 35% frequency In the new world, but also 
shows cosmopolitan distribution. There Is depletion of sequence variation on the HbSzl-carrylng haplotype. Quantitative 
genetic tests Indicate that HbSA affects developmental time, but not viability of offspring. The Eve expression pattern differs 
between inbred lines, but the stripe 3 and 7 boundaries seem unaffected by HbSA. The data reveal segregating variation in 
regulatory elements, which may reflect evolutionary turnover of characterized TFBS due to drift or co-evolutlon. 
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Introduction 

Evolution of Transcriptional Regulatory Sequences 

The molecular basis for phenotypic divergence and standing 
variation is often attributed to differences in the regulation of 
transcription [1-3]. The mechanistic principles of regulatory DNA 
and factor structure and function such as; multiple transcription 
factor binding sites (TFBS), TFBS motif degeneracy, cooperativity 
and number of trans factors [3,4] and interactions between 
transcription factors (TFs), enhancers and promoters [5,6] impose 
unique rules on their evolution. Regulatory DNA has no single 
"active-site", since most regions consist of multiple transcription 
factor binding sites. Evolutionary analyses of experimentally 
verified TFBS demonstrate examples of conservation, but also 
reveal evolutionary turnover of TFBS, were some sites are lost and 
others gained [7-9]. 

It has been postulated that selection mainly acts on the 
transcriptional output of a gene (timing, location and amount) 
and does not preserve individual TFBS [10,1 1]. That is, changes 
in TFBS and even losses are permitted, if the transcriptional 
output is preserved. Such models of stabilizing selection acting on 
transcriptional output can account for both loss of functional 
binding sites and evolutionary fine-tuning of regulatory elements 



[12]. They also suggest that positive selection may sometimes play 
a role, acting on compensatory mutations in cis or trans. Several 
studies [13-16] have investigated the evolutionary origin of TFBS, 
including co-evolution within regulatory sequences. From first 
principles one would predict both co-evolution in cis (promoters, 
regulatory modules, more distantiy located signals like insulators) 
and co-evolution of sequence elements with the trans environment 
(abundance of transcription factor, mediator or holenzyme 
components). The model of tans co-evolution is corroborated by 
studies of between-species hybrids [17], which e.g. reveal mis- 
expression of genes in hybrids ofZ). melanogaster a.nd D. simulans, two 
closely related species, most likely due to species-specific cis-trans 
compensatory evolution. Also, genome-wide changes in cis 
elements of co-expressed genes in two distantly related yeast 
species document the co-evolution of the TF repertoire of an 
organism and the regulatory elements of coordinately expressed 
genes [18,19]. Numerical models show how mutation and drift can 
generate binding sites, and predictably that selection can speed up 
fixation of new TFBS [20]. Crucially, functional polymorphism 
(both single nucleotide polymorphism: SNPs or insertion/ deletion 
polymorphism: indels) in human enhancers, are shaped by positive 
selection [21]. 
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Insertion and Deletion Polymorphism in Regulatory DNA 

Population genetics studies liave largely neglected indels, 
perhaps because they represent a minority of segregating variation 
in most genomes [22]. Deletion polymorphism in the intergenic 
region ofAdh in Drosophik pseudoobscura does not conform to neutral 
evolution, but exhibits signatures of purifying selection, i.e. deletion 
(but not insertion) polymorphism was removed from introns over 
time [23]. On a larger scale, Comeron and Kreitman [24] 
revealed a bias in the insc-rtion and deletion frequency distribution 
in D. melanogaster populations. While deletion events were more 
common and on average longer, insertions were at significandy 
higher frequency. This may reflect both mutational bias (because 
the mechanisms causing deletions are difierent from those causing 
insertions) and a difference in selection pressures, with purifying 
selection keeping a large- fraction of deletions at low frequency in 
the population [24,25]. Ometto et al. [25], on the other hand, also 
concluded that weak positive- seie-ction might increase the 
population frequency of some insertions, which is supported by 
a genome-wide study in D. melanogaster [26]. Population genetic 
analyses of Bicoid response genes in D. melanogaster revealed single 
nucleotide polymorphism (SNPs) in 13 of 85 predicted Bicoid 
binding sites [27]. Most notable was the high frequency of SNP 
and indel polymorphism in the Orthodenticle (Otd) early head 
enhancer. These polymorphisms clustered on two haplotypes, 
both at intermediate frequency. Transgene tests showed that the 
Otd haplotypes differ in transcriptional output [27]. Similarly, 
studies of the Endol6 promoter and other sea-urchin enhancers 
[28-30] show that many TFBS are affected by segregating indel 
variation. In particular, in EndolG two rare insertions affect the 
same part of the promoter. One of these generated a functional 
repressor module [29]. 

Enhancers of Eve as a Model of Regulatory Evolution 

Early embryonic development in D. melanogaster is regulated by 
numerous genes through a complex network of activation and 
repression, resulting in segmental boundaries along the embryo 
length [31-35]. The accurate temporal and spatial expression of 
these genes is mainly achieved by integration of multiple TFs and 
their binding to regulatory sequences. Some regulatory functions 
(required for a given expression pattern) are aggregated in distinct 
modules like the eve stripe 2 enhancer (s2e) and the stripe 3 and 
7 enhancer [s3+7e). These experimentally verified "minimal" 
enhancers [36,37] suffice to generate 4—7 cell-wide Eve stripes in 
early development. Not all regulatory sequences contain modular 
enhancers, and often spacer serjuences (separating regulatory 
modules) have function, meaning that the length of these 
sequences matters for proper function of flanking cis-modules 
[10,38]. 

The cumulative effect of nucleotide changes in s2e between 
species is a turnover of functional motifs within enhancers 
[8,1 1,39]. Notably, the s2e from D. erecta is less effective than s2e 
from the more distantly related D. pseudoobscura at complementing 
a deletion of the s2e in D. melanogaster [11]. Quantitative analysis of 
the amount of Eve in stripe 2 iUustrated the functional deficiency 
of the D. erecta s2e in the D. melanogaster genetic background. This 
means that, for a given enhancer, the spatial and temporal features 
of the expression pattern are highly conserved, but the quantity of 
gene product probably less so. The expression level of develop- 
mentaUy-specific gene products may exhibit changes over evolu- 
tionary time, possibly reflecting "developmental system drift" 
[40,41]. 

The aim of the current study was to gauge the level of 
polymorphism in the well-characterized regulatory regions in 
D. melanogaster, with particular focus on insertion and deletion 



polymorphism. Consistent with other studies and evolutionary 
theory, SNP and indel-polymorphism are rare in TFBS. However 
we find two peculiarly large and common deletions in and close to 

the em stripe 3 and 7 enhancer. Both deletions remove binding 
sites for Hunchback, prompting analysis of the genetics and 
phylogeography of one of those polymorphisms and its potential 
phenotypic effects. The data provide insights into the nature of 
variation segregating in cw-regulatory elements. 

Materials and Methods 

Flies and Populations 

Several populations of flies where studied. The population 
genetic surveys were done on collections of inbred lines derived 
from North Carolina, collected in 2000 and 2005 [27,42], and a 
Costa Rican sample from Peter Andolfatto, made isogenic for the 
second chromosome by three generations of crosses. Walter Eanes 
provided DNA from thirteen US East coast populations [43]; a 
total of 380 individuals used to test for clinal variation in the eve 
region. Jean-Claude Walser provided a sample of 46 cosmopolitan 
populations [44], in which DNA from 100 lines in each population 
was pooled. 

PCR, DNA Sequencing and Genotyping 

Primers were designed with primer 3 version 0.3 (frodo.wi.mi- 
t.edu [45]) for 13 well-charactt-rizcd early developmental enhanc- 
ers or promoters and several other non-coding regions (see Table 
SI). The regions studied were several parts of the eve locus (the late 
element, s2e, s3+7e, and the promoter, along with two spacer 
sequences), Kruppel promoter and CDl, salm wing blade enhancer, 
ems abdominal enhancer, en regulator}' region and promoters 
ofAntp, Ubx-bxd, til, Act57B, RpL29/CG30390 and RpL30. The 
sequence variation in those regions was assessed by PCR followed 
directiy by DNA sequencing. PCR was done as before [46] with 
Takara Taq and MJ Tetrad machines on 96 well plates. Products 
where purified by Qiagen purification columns or Exo-sap. DNA 
sequencing was done on purified PCR products, with the forward 
and reverse primer using Applied biosystems reagents. The 
ethanol purified reaction products where run in the University 
of Chicago sequencing facifity or the ABI sequencing machine at 
the Institute of Biology, University of Iceland. 

The deletion of the Hb8 site in s3+7e (see below) and the wild 
type allele were genotyped with PCR using aUele-specific primers 
(Table SI). We ran separate reactions for both alleles on 
individuals from the East coast sample and on bulk DNA samples 
from the cosmopolitan sample. This was used to infer geographic 
distribution of specific variants, but does of course not yield 
information about frequency. All sequences were submitted as 
Popset data to NCBI (accession numbers: KJ465109-KJ465866), 
except two alignments that were shorter than 200 bp (provided in 
fasta format as Supporting information SI and S2). 

Population Genetic Analysis 

Metrics of population genetics (S, n, 9, Haplotype number) were 
calculated for SNPs and indels with Tassel vs. 2.1 (www. 
maizegenetics.net [47]), either for individual regions or as a 
sliding window for the haplotype analysis. Tassel was also used to 
calculate LD, and R (www.r-project.org version 12.3 [48]) for 
testing of contingency tables. DNAsp vs. 4.1 (www.ub.edu/dnasp 
[49]) was also used to test for deviations of Tajima's D and Fu and 
Li's estimators. Furthermore Hudson's haplotype test [50] 
(utilizing the ms program and the psub option) was used to test 
for positive selection in four eve regions. 
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Phylogenetic Shadowing 

A 2 kb region surrounding the stripe 3+7 minimum enhancer 
was blasted against the 12 finished genomes (insects.eugenes.org/ 
species), and the orthologous regions extracted (except D. willistoni 
which did not return a significant blast hit). The DrosophUa 
species (abbreviated) and contig names and locations are listed; 
D. melanogaster (D. met), release 4, D. simulans {D.sim) chromosome 
2R, bases 4491595 to 4494659, D. sechellia [D. seek) scaffold 359, 
bases 7623 to 10695, D. ereda {D. ere) scaffold 4929, bases 8504394 
to 8507885, D.yakuha [D.yak] chromosome 2L, bases 18628840 to 
18632292, D. ananassae (D. ana) scaffold 13266, bases 15371395 to 
15373454, D. pseudoobscura [D. pse) chromosome 3, bases 10879010 
to 10881069, D. persimilis {D. per) scaffold 4, bases 6230662 to 
6232721, D. virilis (D. vir) scaffold 12875, bases 1335449 
to 1337479, D. grimshawi (D. gn] scaffold 15245, bases 9663295 
to 9665324, D. mojaven.sis (D. moj) scaffold 6496, bases 4426987 
to 4430428. The sequences were aligned with MAVID 
(baboon.math.berkeley.edu/mavid [51]). Divergence in these 
sequences is considerable, requiring manual curating in Genedoc 
(www.psc.edu/biomed/genedoc [52]), with special devotion to 
characterized TFBS fi-om redfly.ccr.buffalo.edu [53] and OR- 
egAnno [54]. In addition two additional Hb sites (Hbl5 and 
Hbl6) found by Stanojovic et al. [55] and two Stat binding sites 
discovered by Yan et al. [32] were included. We found that the 
D. melanogaster Stat binding sites differ from the genomic sequence, 
probably due to sequencing error (Stat-1 was reported to start with 
an A and stat-2 was reported as GTTCCCCGAAA, highlighted 
bases differ). 

We also used (jaspar.genereg.net [56]) to predict Hb binding 
sites (score above 6) in the ~8000 bp upstream of eve, in 
D. melanogaster, D. sechellia, D. yakuba and D. pseudoobscura. Based 
on multiple cdignments from Mavid, and Multiz alignments from 
the Santa Cruz genome browser (downloaded in December 2013), 
we mapped predicted Hb binding sites in orthologous and more 
rapidly evolving regions. 

Testing the Effects of a Segregating Deletion on Adult 
Phenotypes 

A set of 20-60 healthy inbred lines from NC [46] were used for 
the two experiments conducted to test the effects of a 72 bp 
deletion within s3+7e (called Hb8A, see below) on viability and 
developmental time. The first was a set of controlled crosses to 
lines deficient for eve, and the second was phenotyping of 60 
genotyped inbred lines. All fly-rearing took place on cornmeal 
food at constant temperature, 25°C. 

We first crossed the inbred lines to four stocks with character- 
ized eve mutations. Ten inbred lines, homozygous for each allele 
(Hb8A or wt) were crossed to each eve mutant. The Bloomington 
stock numbers and genotypes are; BL-4084: eve[5\ I SM6a, BL- 
5344: em [1]/Cy0; P{ry[+t7.2] =ftz/lacC}, BL-1719: Df(2R)X3/ 
CyO, AdhfnBJ and 1702: Df(2R)Xl, MeJ2[Xl] /CyO, AdhfnB]. Three 
virgins of a mutant stock were crossed with 3 males from each of 
the 20 inbred lines, and allowed to lay eggs for 2-3 days. The 
offspring were counted and sexed, between 10 and 11 am, from 
day 10 to 18. The experiment was fully balanced and repeated 
three times, several weeks apart. The parents of all lines used in the 
crosses had been grown for 2 generations uiidc-r controlled density 
(parents discarded between days 2 and 5 depending on visual 
assessment of egg number). We recorded both the total number of 
offspring (viability), and developmental time, summarized as the 
average time to eclosion for a given combination of, mutation, 
cross, genotype, sex and replicate. 

For the association tests, 60 inbred lines where studied. The 
Hb8A/wt polymorphism was genotyped in three individuals of 



each line in the generation that was phenotyped. The rearing and 
measuring procedure was identical to the first experiment, except 
no crosses were required and only replicates were measured (two 
weeks apart). 

Embryo Collections, Fixing and Staining 

The embryos were collected, fixed and stained with standard 
protocols, as we have done before [8]. Four inbred lines with 
(NC25 and NCI 28) and without (NC006, NC017) the HbSA laid 
eggs for 4—5 hrs at 22°C. Briefly, we collected embryos from each 
of the four lines, and they were fixed. Multiple embryo collections 
were pooled before staining with Eve primary antibody and a 
secondary antibody. The histochemical LacZ staining reaction was 
run for 12 minutes. The stained embryos were stored in 70% 
glycerol at 4°C, and photographed within a week. 

Photography and Measurements 

Each embryo in the appropriate developmental stage range was 
photographed three times at 20X magnification with water 
immersion on a Zeiss microscope. First a DIG sagittal section 
yielding maximum length of embryo and then two sections (DIC 
sagittal and bright-field) captured the stripes. Tiff photographs 
were saved and the X and Y coordinates of stripe boundaries 
assessed in ImageJ (rsb.info.nih.gov/ij/ [57]). First, a straight line 
was superimposed on the sagittal image, and the X-Y coordinates 
of anterior and posterior of the embr\'o recorded. Second, the 
same guideline was superimposed on the other two images and X- 
Y landmarks of the anterior and posterior boundary of each stripe 
were visually assessed and recorded. Third, the rotation of the 
embryo along the Dorsal/Ventral axis was scored. FinaUy, the 
stage of development was also visually assessed from eve pair-rule 
expression, in increments of 0.5 on the scale from 1 to 5, around 
ceUularization [11]. The same investigator (AP) did aU measure- 
ments. 

Summarizing the Expression Pattern 

The raw landmark data indicating the length of the embryo and 
placement of stripes were processed in two ways. The relative 
positions of stripe boundaries were estimated by calculating 
distance of landmarks from the anterior and posterior end using 
standard geometric formula. First, the length of the embryos was 
estimated. Second, the relative distance from one embryo tip to 
the anterior and posterior boundary of each stripe was calculated. 

Statistical Analysis of Adult and Embryonic Phenotypes 

SAS version 8.2 [58] was used for analyses of phenotypes. The 
viability and developmental time analyses were conducted with 
mixed model ANOVA (proc MIXED). The model for the test- 
cross was: 

Y=M+C+MXC+G+MXG+CXG+S+0 
+ L{CXG) + error 

Denoting the fixed effects of the mutation (M), that is the 4 
different eve dc'ficic'ncies or point mutations, the cross (C) 
designating the balancer (CyO) or the "loss of function" (LoF) 
eve mutation, the genotype (G) term which evaluates the effects of 
HbSA, sex (S) and appropriate interaction terms. The effects of 
Line (L) and replicate vials (R) are considered random factors. 
Furthermore, the total number of offspring (O) was included as a 
covariate. As a large factorial model with 4 frxed terms runs the 
risk of being overly parametrized, higher order terms were 
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D 

Hbl4a Kni5 Stat2 HblO Hb8 Hbsl 

D.mel TTTTTTGTTT CTGCGCTAGTT TTCCCCGAA TTTTTTAATTC GTTTTTACGA TTTTTTATGA 

D.sim 

D.sec T G 

D.yak C.C A T 

D.ere T c A T 

D.ana A n/a n/a 

D.pse A T.. N/A 

D.per A T. . N/A 

D.vir N/A TT. N/A 

D.gri N/A . . . .c A tt. n/a 

D.moj ^^/^ A TT. N/A 

Figure 1. Two large deletions remove conserved Hunchback binding sites in the eve stripe 3+7 enhancer. A) The structure of the 
upstream region of eve, open boxes represent the late element, s3+7e, s2e and promoter regions, and green boxes the two exons. The deletions are 
shown by blue (Hb8A) and red {Hbsl A) triangles. B) Detailed structure of the Hb8A and frequency of the four alleles at this position in a Costa Rican 
population. C) Structure of Hbsl A and frequency of alleles in the same population. D) The conservation of a subset of TFBS in the s3+7e and the Hbsl 
site. Full species names are provided in IVIaterials and Methods and data for other s3+7e binding sites in Table S3. 
doi:10.1371/journal.pone.0091924.g001 



evaluated and dropped if they were not significant at the 0.05 
level. The association tests of the inbred lines data were simpler, 
with only terms denoting genotype, sex and total number of 
offspring, and not described here. 

The relative location of histochemically detected Eve stripes was 
studied similarly. In order to remove the effects of orientation, a 
reduced model was fit, and the residuals were used in the 
subsequent analysis. The positioning of stripes was analyzed with a 
mbced model ANOVA. The dependent variables of interest are 
the relative positioning of stripe boundaries, with the anterior 
boundary of stripe 3 (S3A) and the posterior position of stripe 7 
(S7P) being particular candidates given prior evidence on 
Hunchback distribution in the embryo [59] . The ANOVA model 
had the general form: 

Y = G+T+GXT + L{G)+eiror 

Where G, indicating genotype (the presence or absence of Hb8), 
is a fixed main effect. The covariate T (for developmental time) 
captures the developmental progression and L is a random term 
for different inbred lines. The relative stripe position matrix 
(anterior/ posterior boundary of all 7 stripes) was also summarized 
with Principle component (Proc PRINCOMP) on the correlation 



matrix. Only the first component, with eigenvalue 7.42, was 
analyzed for dependence on Hb8 genotype. 

Results 

Polymorphism in Regulatory DNA Includes Large 
Deletions of TFBS 

First we surveyed the molecular variation, i.e. nature, frequency 
and distribution of polymorphisms, in 1 3 well studied Dro.wphila 
regulatory elements and several less well defined elements and 
spacer sequences. Few indel polymorphisms are found in the 
regulatory regions, 8 of the regions have no indels (Table 1). 
Purifying selection seems to affect both SNP and indel polymor- 
phism, as there is a significant correlation between 6 for SNPs and 
indels (r=0.48, p = 0.03, Figure SIA). The size and frequency of 
indels in characterized cis-elements was contrasted to those in non- 
coding regions surrounding two developmental genes, hairp and 
EGFR [46,60]. As was previously observed [46] most indels are 
short, and rarely do large indels (more than 10 bp) reach 
appreciable frequency (Figure SIB). The notable exception is a 
72 bp deletion in the stripe 3 and 7 enhancer (s3+7e) of eve 
(Figwe lA and B). Interestingly this deletion removes a DNase I 
characterized Hunchback (Hb) binding site [55], and is henceforth 
called Hb8A. Bioinformatic analyses in Jaspar show that this site 
has a PWM score of 8.5, suggesting the notion that this a 



PLOS ONE I www.plosone.org 



4 



May 2014 | Volume 9 | Issue 5 | e91924 



Turnover of Binding Sites within Species 



o ^ 

I I 



Q 

"n 
E 



U-i vO CTi CO 



u-i 




m 


m 


■=3: 


Ln 


d 




d 



E 

OJ 

o 

OJ 



a. 
Z 

1/) 



OJ 



^ o >- ^ 



S E 

<D -a 



o 
c 

"5 



§ s s 

ro vo t— 



m L/^ ro ro rs VD 
^ 0> 00 ro CO 



O 

o 

Q. 

-a 



-D 



f5 "O 



o 



B 2 E 

V 



£ £ £ E E E 



^ Si ii; ii; 



£ £ 



Q. U U 



ai 



^ o 



n 



(II 

V} 



i H O 



PLOS ONE I www.plosone.org 



5 



May 2014 | Volume 9 | Issue 5 | e91924 



Turnover of Binding Sites within Species 



Table 2. Polymorphism in four regulatory elements of eve among inbred lines from North Carolina. 





Region 


Length 


Sample* 


S 


71 


Dxy 


Haplotypes 


Hd 


Late 


327 


All 


4 


0.0034 


0.0036 


7 


0.8 






wt 


4 


0.0034 




6 


0.748 






HbSA 


3 


0.0028 




5 


0.663 


s3+7e 


262 


All 


6 


0.0103 


0.0131 


7 


0.805 






wt 


6 


0.0098 




7 


0.8 






HbSA 


0 


0 




1 


0 


s2e 


547 


All 


11 


0.0050 


0.0057 


18 


0.859 






wt 


11 


0.0048 




13 


0.862 






HbSA 


8 


0.0021 




6 


0.447 


Pro 


565 


All 


12 


0.0052 


0.0054 


15 


0.864 






wt 


n 


0.0056 




12 


0.863 






HbSA 


7 


0.0020 




5 


0.442 



•Sample size: All (N = 63), wt (N = 43), HbSA (N = 20). 
S: segregating sites. 

Dxy: Average number of nucleotide substitutions per site between wt and HbSA samples. 
Hd: Haplotype diversity. 
Pro: Promoter. 

doi:l 0.1 371 /journal.pone.0091 924.t002 



transcription factor binding site presence/absence polymorphism. 
Oddly enough, less than 250 bp away (in the spacer separating 
s3+7e and s2e), another segregating large deletion also removes a 
putative Hunchback binding site (Figure IC). This site (here called 
Hbsl) is predicted with high PWTVl score, 1 1.2. That is the fourth 
highest score of 60 predicted Hb sites in the 8 kb region upstream 
oi eve in D. melanogaster (Figure S2A and Table S2). Most of the 21 
DNasel characterized Hb sites in s3+7e and s2e have lower scores 
than Hbsl. This 45 bp deletion in the spacer is referred to as 
Hbsl A. This putative Hb binding site has probably been unnoticed 
for two reasons. It sits outside the fragments tested for enhancer 
fiinction, presumably because of restriction site locations [10,37]. 
Also, the D. melanogaster reference genome sequence contains the 
deletion. To iterate, the 45 deleted bases do not appear in the 
standard versions of the D. melanogaster genome and are only visible 
in genomic alignments with close Drosophila relatives or population 
genetic sequence data. The two deletions sit on distinct haplotypes, 
and are never found in the same inbred lines. They are both at 
appreciable frequency, in a sample of 55 Costa Rican chromo- 
somes the HbSA and Hbsl A are at 9% and 17% frequency 
respectively (Figure IB and C). This leads to the question, are 
these deletions harmful, neutral or beneficial? 

Phylogenetic Footprinting of s3-i-7e shows the Hb Sites 
are Conserved 

Comparative genomic alignments of the s3+7e and the adjacent 
regions with 12 pubKcly- available Drosophila genomes [61] were 
used to assess the functional importance of these two predicted Hb 
binding sites, and other characterized Hb, Kni and Stat sites 
[32,55,59]. Similarly to the eve s2e, TFBS in s3+7e are highly 
conserved (Table S3); 3 of 13 Hb sites are identical from D. 
melanogaster to D. mojavensis and 9 have none or only one mutation 
between D. melanogaster and D. persimilis. The Hb8 site is found in 
all of the 1 2 species, except D. ananassae (most probably due to a 
gap in the genomic sequence), but has experienced several 
substitutions (Figure ID). The PWM score for Hb8 is 8.2 in D. 
melanogaster and D. simulans, but 9.9 in D.jakuha and D. pseudoobscura 
(Figure S2 and table S2). On the other hand, the predicted Hbsl 



site (with a PWM score of 11.2) is completely conserved between 
D. melanogaster and D.yakuba, but was not found in distantiy related 
species. Those data suggest considerable evolutionary constraints 
on those sequences, arguing that they could indeed be functional 
Hb binding sites. But in the absence of functional tests they must 
regarded as putative Hb binding sites. 

Additionally, the HbSA also removes half of a putative Sloppy 
Paired 1 (Slpl) binding site. The putative Slpl site is less conserved 
then the characterized Slpl site in s2e [52] (Table S4), but no SNPs 
within either of these two (characterized and putative) Slpl 
binding sites in eve, in 104 sequenced alleles, suggests selective 
constraint within D. melanogaster at least. The genome comparisons 
confirm that both Hb binding sites in eve affected by these two 
deletions have been protected by purifying selection. This prompts 
the question, why do these deletions of conserved TFBS occur at 
such high frequencies in populations? Here we focus mainly on 
studying the population genetics of HbSA and assess its potential 
impact on development and fitness. 

Polymorphism on the Hb8A and wt Haplotypes 

How can a deletion removing a conserved binding site be at 
such high frequency in the population? One possibility is that the 
deletion of Hb8 is buffered by compensatory mutations (sitting 
on the same haplotype). To assess this, and to evaluate tiie 
polymorphism in the region, two strategies were deployed. One 
was deeper sequencing of four eve regions (the promoter, s2e, s37e 
and the late element) in inbred strains from North Carolina, and the 
other, a contrast of sequence diversity in alleles with or without the 
HbSA in ~8 kb around s3+7e. 

The HbSA is at 32% frequency in the NC population (N = 63), 
and there is less variation on the HbSA haplotypes compared to 
the U)t haplotypes (Table 2). For instance n (which captures the 
number of substitutions and their frequency) is 25% to 100% 
lower on the HbSA haplotypes. This is most extreme in the s3+7e, 
and notably weaker in flanking regions. This tendency was 
captured by other population genetic summary statistics (S, 
Haplotypes, haplotype diversity and Dxy - a measure of 
difierences in nucleotide substitution rate between samples). 
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Figure 2. Polymorphism in the ~8200 bp eve region. Visualized are positions 5,860,182-5,868,302 on 2R, with the Hb8A at position 3292 and 
HbsIA at 3602 (blacl< dots). Contrast of polymorphism in the HbSA (black) and wt haplotypes (gray), with n in A) and Q in B), in 800 bp windows, 
sliding 100 bp. C) LD between the HbSA and other variant in the region, estimated with i^. 
doi:1 0.1 371/journal.pone.0091 924.g002 



Furthermore, no unique mutations are found on the evc-HhSA 
haplotypes; the variation observed on the Hb8J haplotypes is all 
presumed to be due to recombination. These observations suggest 
positive selection favors the HbSA or linked variants. However 
none of the standard population genetics tests (Tajima's D or Fu 
and Li's statistics) indicate positive selection (data not shown); 
neither did the Hudson et al. (1993) haplotype test (p>0.73 for 
each of the four regions). 

We next compared more extensively the sequence variation on 
the HbSJ and wt chromosomes and screened for variants that 
might possibly compensate for the loss of this Hb binding site. We 
estimated the polymorphism on two distinct haplotypes carrying 
either the wt or deletion polymorphism, by sequencing 1 6 [HhSA) 
and 18 {wi) chromosomes of each type. The 8200 bp region we 
selected spans the eve neighborhood, from the 3'UTR of CG12134 
to the end of the transcript. There is reduced polymorphism [n and 
0) on the HbSA haplotypes compared to wt haplotypes (Figure 2A 
and B), which is consistent with selection for the HbSA bearing 
haplotype. Another indicator of long haplotypes is high LD 
between HbSA and polymorphic sites in the region (Figure 2C). 
Several sites more than 3 kb away from Hb8 are in high LD {/> 
0.7) with the deletion. Additionally, most polymorphism in the 
region shows perfect coupling or repulsion LD to HbSA (data not 
shown). (The HbslA was only found in 3 (wt) lines. Omission of 
those 3 lines did not afiect the outcome of the polymorphism 



analyses - data not shown). Furthermore, no variants are unique to 
the HbSA haplotype. Finally, no potential compensatory mutations 
that strengthened or generated other Hb sites were observed. The 
data do reveal less diversity on the HbSA haplotype, compared to 
the wt haplotype. Note however, standard tests of natural selection 
can not be deployed on these data because the sampling was not 
random from a population; lines were picked for sequencing to get 
similar representation of wt and HbSA chromosomes. 

Geographic Distribution of the HbSA 

What is the geographic distribution of HbSA and does it 
correlate with geographic attributes? To study the geographic 
distribution, bulk DNA samples from ,5 1 cosmopolitan samples, 
from Europe, Africa, Asia and South America [44] were 
genotyped with allele specific primers. There was evidence of 
HbSA in 43 of the 51 populations (Table S5), consistent with an 
evolutionarily old and broadly distributed polymorphism. The 
cosmopolitan distribution of the HbSA is unlikely if it was strongly 
deleterious. 

Does this binding site deletion show any relationship with 
geographic attributes? To assess this we genotyped HbSA in 13 
east coast samples, from Maine to Florida [43]. The frequency 
ranged from 5% to 3.5% (Table 3) but there was not a significant 
relation between latitude and frequency oi HbSA (/<=— 0.006, 
jd = 0. 1). For comparison the s2e was also sequenced in the same 
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Table 3. Frequency of Hb8A and s2e polymorphisnn along the east coast of North America. 



Populations 




HbSJ 




s2e 










Location, State 


Latitude 


Freq. 


^57- 


N 


Sites 




0 


^57- 


Homestead, FL 


25° 2' 


0.32 




24 


5 


0.0020 


0.0024 




Merrit Island, FL 


28° 3' 


0.16 


0.051 


26 


7 


0.0024 


0.0033 


0 


Jacksonville, FL 


30° 2' 


0.19 


0.000 


29 


7 


0.0022 


0.0032 


0 


Eutawville, SC 


33° 2' 


0.20 


0.000 


26 


5 


0.0021 


0.0024 


0 


Smithfield, NC 


35° 3' 


0.14 


0.000 


16 


4 


0.0024 


0.0022 


0 


Richmond, VA 


37° 3' 


0.05 


0.033 


33 


5 


0.0018 


0.0022 


0 


Churchville, MD 


39° 3' 


0.17 


0.052 


23 


5 


0.0018 


0.0024 


0.051 


Middlefield, CT 


41° 3' 


0.09 


0.017 


37 


5 


0.0019 


0.0022 


0.007 


Concord, MA 


42° 0' 


0.19 


0.030 


41 


5 


0.0014 


0.0021 


0.009 


Whiting, VT 


43° 6' 


0.17 


0.000 


30 


2 


0.0015 


0.0010 


0 


All 




0.17 


0.055(0.04)* 


285 


11 


0.0020 


0.0032 


0.029(0.02)» 



The s2e amplicon was 555 bp. 
Sample size for HbSzf was 380. 

*Average F^j (standard deviation}. None of these pairwise F^t are significant after Bonferroni correction. 
doi:l 0.1 371 /journal.pone.0091 924.t003 



individuals. Again, no unique SNPs are found on tlie Hh8A 
haplotype. Thus, nothing in in tliis broader N-Ainerican sample 
suggests complementary mutations in s2e. Curiously however, 
there is a significant reduction in s2e polymorphism with latitude 
(p = 0.02 for 71 and ff). This does not explain the prevalence of 
HbiSA, but suggests geography (or history) affects variation in the 
regulatory regions of some developmental genes. 

Testing for Effects of Hb8A on Viability and 
Developnnental Time 

Test crosses and analysis of inbred lines were used to gauge the 
putative impact of Hb8A on the number of offspring hatching and 
developmental time. Here developmental time is assessed as the 
time to eclosion (see methods). 

Consistently with earlier studies [63,64] hemizygosity at eve 
reduces viability (Table 4) by about 20% in all crosses except to 
eve^ (DF vs. Cy in Figure S3). However offspring number was not 
affected by the deletion of Hb8 binding site (Genotype term in 
Table 4). Number of hatching offspring differs between the four eve 
mutant stocks (Table 4) most likely due to varying genetic 
backgrounds. We also asked about factors influencing develop- 
mental time. The ANOVA's indicate difference among eve alleles, 
and potential effects of hemizygosity at the locus (Table 4). Most 
notably, Hb8A seems to reduce developmental time (Table 4) - 
while hemizygosity at eve increases it. In three of the four crosses 
did HbSA individuals develop significantly faster than the wt flies 
(Figure 3). The HbSA flies eclose on average 3.5 hours earlier, but 
again no effects are seen in eve . This effect was also seen if the 
effect is estimated for sexes separately. In 1 3 of the 1 6 Mutation- 
Cross-Sex combinations HbSA developed faster than flies with wt 
s3+7e, which is significant in a sign-test (binomial, p = 0.02). Note 
the HbSA is tested in heterozygous form, thus in these crosses it 
appears to have dominant effects on developmental time. 

We also examined the effects oi HbSA with association tests in 
60 inbred lines. As before, HbSA had no effect on offspring 
number. Peculiarly, the data do not confirm the association 
between HbSA and developmental time (lower part of Table 4). 
The estimated developmental time is in the same range for both 
experiments suggesting they are not systematically different. 



Together these data suggest an effect of HbSA on developmental 
time, but further tests are needed to confirm or refute this. 

Histochemical Staining of Eve Expression 

Proximal phenotypes, like protein level at a specific time and 
location in the embryo, might be associated with functional 
variation in regulatory elements. To test this we stained for Eve in 
stage 14A embryos of four inbred lines, two HbSA and two wt. 
Mixed model ANOVA shows that the relative positioning of the 
Eve stripe boundaries differs between the four inbred hues studied 
(Table S6). Both developmental stage and embryo orientation 
affect the anterior and posterior boundaries of stripes. Those 
sources of error were accounted for by i) working with the residuals 
after fitting the embryo orientation and ii) using developmental 
stage as a covariate. The average developmental stage does not 
differ between lines (p = 0.8), suggesting that rate of early develop- 
ment does not contribute to the line differences. 

Hb repression establishes die anterior boundary of stripe 3 and 
posterior boundary of stripe 7 [62] . Thus, a priori, those features 
are most likely to be affected by HbSA. However, the mixed model 
ANOVA does not indicate effects of the HbSA on these stripe 3 
and 7 boundaries (Figure 4). It is possible that this Hb site has 
broader function. The only putative signal in the data was with 
stripe 5; according to least square means stripe 5 is found more 
anteriorly in HbSA. But this is not formally significant after 
Bonferroni correction for all 14 tests. A complementary analysis of 
principle components (PC) of the relative stripe positions does not 
implicate HbSA in stripe positioning. The two largest principle 
components capture variation in (PCI) the central stripes and 
(PC2) the anterior - posterior axis of the embryo. The contribution 
of HbSA to principle component 1 is not formally significant 
(Fi_io = 4.25, p = 0.01). These results do not suggest that HbSA 
affects Eve pattern in the early development. 

Discussion 

Sequence comparisons of close and more distantly related 
species show how TFBS emerge, change and get lost [8,65]. Is this 
turnover of functional sequences due to relaxed purifying selection, 
or does positive selection play a role [66-68]? There is substantial 
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Table 4. ANOVAs testing for the effect of Hb8/i (genotype) on viability and developmental time. 









Viability 








Developmental time 






Term/ 




F/ 




Term/ 


F/ 




Exp" 


Var.Comp 


df 


Est(SE)'' 


P 


Var.Comp 


df Est(SE)'' 


P 


Test Cross 


Mutation 


3,493 


55.15 


9.4E-31 


Mutation 


3,486 5.52 


9.8E-04 




Cross 


1,36 


20.06 


7.3E-05 


Cross 


1,36 0.46 


5.0E-01 




M X C 


3,493 


6.81 


1 .7E-04 


M X C 


3,486 2.64 


4.9E-02 




Genotype 


1,36 


0.09 


0.77 


Genotype 


1,36 12.62 


l.lE-03 




M X G 


3,493 


6.39 


3.0E-04 


M X G 


3,486 1 .46 


0.22 




C X G 


1,36 


0 


1.00 


C X G 


1,36 0.28 


0.60 




Sex 


1,493 


1.04 


0.31 


Sex 


1,486 6.67 


0.01 




^Une(CG) 




10.8(4.1) 


3.9E-03 


Offspring 


1,486 4.53 


0.03 




^enor 




80.9(5.2) 


9.3E-56 




25.5(15.1) 


0.05 












"error 


538.7(34.4) 


1.9E-55 


Inbred lines 


Genotype 


2,53 


0.26 


0.77 


Genotype 


2,53 1.71 


0.19 




Sex 


1,136 


3.93 


0.05 


Sex 


1,132 3.4 


0.07 




G»S 


2,136 


0.26 


0.77 


G«S 


2,132 0.73 


0.48 








109.2(26.4) 


1 .8E-05 


Offspring 


1,132 0.02 


0.90 








27.2(8.6) 


8.0E-04 


^Une(G)^ 


231.6(63.4) 


1.3E-04 




"error 




53.1(6.4) 


8.2E-17 


VRep(Q 


122.4(29) 


1 .2E-05 












" error 


107.1(13.6) 


2.0E-15 



Mutation tests for differences among eve allele stocks. Cross the balancer vs loss-of-function eve allele, and genotype the wt vs. Hb8A. 

^Experiment: a test cross of 20 lines with defined genotype to four eve mutants and genotype tests on the 60 inbred lines. 

''For fixed terms the F-statistic is reported and for the random terms the estimated variance components (e.g. Vune{c Gj) with standard error. 

'^The significance of the variance components was determined by the z-function. The variance component for Developmental time was multiplied by 1000 for 

representation. 

doi:1 0.1 371 /journal.pone.0091 924.t004 



variation in gene expression among individuals and the bulk of 
expression QTLs map in cis [69-7 1] . The exact nature of those cis 
variants is rarely known, but a systematic review by Rockman and 
Wray [72] shows that SNPs, indels and length polymorphism in 
repeats can abolish TF binding and affect expression of 
neighboring genes. 

Hunchback Site Polymorphisms are not Deleterious 

Here we report that two large deletions segregating at moderate 
frequency remove predicted Hunchback binding sites in, and next 
to, the stripe 3 and 7 enhancer of eve. Both sites have high PWM 
scores and are evolutionarily conserved. One of them (Hb8) was 
characterized molecularly [55]. Three observations suggest that 
Hbsl, removed by a 45 bp deletion, is a true Hb binding site. It 
has among the highest PWM score of Hb sites in the eae region. It 
is evolutionarily conserved between D. melanogaster and D. erecta and 
resides less than 250 bp away from the Hb8 site. Stanojevic et al. 
[55] footprinted 4 Hb sites in the spacer between s2e and s3+7e, 
and recent thermodynamic models and quantitative measure- 
ments of TF abundances indicate that the spacer between s2e and 
s3+7e contains functional Hb motifs [73]. However functional 
assays are required to confirm that Hb binds to these two sites 
in vivo and modulates eve expression. 

Our initial hypothesis was that these deletions of Hb binding 
sites are deleterious, as the loss or modulation of a single TFBS can 
have measurable effects [72,74,75]. This is refuted by several facts: 
1) both mutations are at appreciable frequency, 2) individuals 
homozygous for each of those deletions survive as inbred stocks, 3) 
Hb8A has cosmopolitan distribution and 4) HhSA does not seem to 
reduce viability and, if anything, it speeds up developmental time. 



The genetic assays had sufficient statistical power to detect the 
effects of eve hemizygosity on offspring number (consistent with 
reported partial haplo-insufficiency at the locus [63,64]) and less so 
developmental time. Thus we conclude that the HbSA is not 
strongly deleterious. The alternate scenarios are that the two 
deletions are either (nearly) neutral or favored by positive selection. 

The most parsimonious explanation is that HbSA is neutral and 
drifts in the population. This scenario is supported by haplotype 
tests, which do not point to the involvement of positive selection. 
However, the fact that the two deletions destroy binding sites for 
the same TF in the same enhancer is rather puzzling. Thus, it is 
tempting to hypothesize that the two Hb binding site deletions are 
favored by selection. Curiously, no other Hb sites in the s3+7e or 
s2e are affected, no substitutions are seen in more than 100 
sequenced lines. 

Variation in Early Development 

Several studies have documented substantial variation in early 
DrosophUa gene expression, with expression arrays [76], RNA seq 
[77] and in-situs [78]. As the deletions are found in s3+7e, it is most 
probable that they could affect Eve stripes 3 or 7. Hb is abundant 
in the anterior of the embryo, and drops adjacent to the anterior 
boundary oteve stripe 3. Hb is also produced in a narrower domain 
in the posterior, close to the posterior boundary oi eve stripe 7 [62]. 
Hb demarcates the boundaries of those stripes (and stripes 4 and 
6). Thus deletions of Hb sites would be expected to lead to an 
anterior shift of stripe 3 and posterior shift of stripe 7, because this 
regulatory module would be less sensitive to Hb repression (the 
absence of its fuU complement of binding sites). Our analysis of 
Eve expression in four inbred lines does not reveal effects of HbSA 
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Figure 3. Effects of Hb8J on developmental time. Represented are least square mean estimates for combination of eve mutation (alleles and 
deficiency chromosomes), and balancer (Cy) or mutation carrying chromosome (DF). Error bars represent 95% confidence intervals. Developmental 
time w/as estimated as the time to eclosion, see methods. 
doi:l 0.1 371/journal.pone.0091 924.g003 



on Eve stripe placement. Genetic and maternal factors affect the 
placement of expression boundaries; physical or environmental 
attributes like egg size do as well [78-80]. Note, lack of evidence 
does not prove the alternative. These results do not prove that the 
Hb<3A does not affect Eve expression. The ideal test of the 
functionality of Hb8A and HbslA requires transgenic constructs in 
a common genetic background or homologous recombination into 
the eve locus of a particular line. It is unclear how such alterations 
would affect proximal or distal features of development. The 
quantitative tests suggest HhSA acts dominantly, and speeds up 
development by ~ 3 hours. This seems unrealistic as the Eve pair 
rule pattern only takes ~50 minutes to mature [81], thus it is 
impossible that these effects (if real) are due to Hb and eve 
interaction during early dc\ (:lopment. But curiously both eve and 
hh also play a role in the developing neuronal system [82,83] but 
the functional interaction of Hb and eve in those tissues is largely 
unexplored. In the absence of functional or genetic confirmation 
we argue for cautious interpretation of the observed association of 
Hb8A and developmental time in the test-crosses. Finally, it is also 
possible that these deletions affect proximal developmental events, 
but that those effects are a minute or acceptable noise in the 
system. 

Can Co-evolution Explain the High Frequency Hb TFBS 
Deletions? 

Co-evolution can occur via neutral changes (e.g. in the network 
neighborhood [84]) or via positive selection favoring compensa- 
tory changes in the genome. Here two co-evolution models that 
may account for these two Hb binding site deletions in eve are 

entertained. Those are i) civ-changes within eve or, ii) fraav-changes 
in the function or abundance of activators and/ or repressors. 



First, the relatively high frequency of those two deletions could 
reflect co-evolution within eoe. Hunchback acts both as a 
transcriptional activator and repressor during development [85- 
87]. Hb positively influences expression via the eve stripe 2 
enhancer, but is part of two-tier repressor system that demarcates 
the boundaries of stripes 3, 4, 6 and 7 [62]. Stripes 3 and 7 are 
known to be activated by D-stat [32], an ubiquitously available 
activator (other agents may also play a role). The high frequency of 
Hb binding site deletions could be a co-evolutionary response to 
increased activation of stripes 3 and 7 expression, for instance via 
altered Dstat binding. This is unlikely as the two D-stat sites in 
s3+7e have not diverged between D. melanogaster and D. erecta (Table 
S2) and no polymorphism is found in those sites within D. 
melanogaster. Binding sites for other agents activating eve stripes 3 
and 7 may have changed; TFBS that could reside elsewhere in 
regulatory regions around eve. The eve regulatory region is 85-95% 
identical between D. melanogaster and D. simulans. We scanned the 
eve region of both species with Jaspar [56], and found hundreds of 
TFBS differing between the species (data not shown). Nonetheless, 
no changes in Hb or Dstat sites were found. It is also possible that 
miRNA docking sites or other regulatory elements in eve have 
changed, thus leading to selection for higher frequency of those 
two Hb site deletions. 

Alternatively, changes in structure or function of frawj-factors, 
like Hb itself, may have led to the increased frequency of those two 
Hb binding site deletions. It is improbable that a protein change is 
responsible, as the differences between the D. melanogaster and 
D. simulans Hb proteins are all on the D. simulans branch 
(unpublished results, Dagmar Yr Amardottir and Arnar Palsson). 
We find it more plausible that the spatial or temporal amount of 
tenj'-factors has changed, for instance a lower amount of Dstat. 
The most intuitive scenario is, quantitative, temporal or even 
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Figure 4. Testing for effects of HbSJ on Eve stripes. A) IMeasurement of eve stripe positioning. A surface image is used for measurement of 
stripe boundaries. A line was superimposed on the embryo and stripe boundaries visually assessed and recorded as X-Y coordinates (black triangles). 
Coordinates for embryo ends (white triangles) are measured from sagittal slices (not shown). B) Significance (negative log of p for genotype; HbSJ vs. 
wt) along the embryo. Shown are lines corresponding to the -log (p = 0.05) cutoff (dashed line) and the Bonferroni correction for 14 tests - 
log(p = 0.0035) (solid line). 
dor:l 0.1 371 /journal.pone.0091 924.g004 
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spatial changes in Hb expression in the embryo - which may have 
prompted co-evolution in regulatory elements sensitive to quan- 
titative changes in Hb amount in development. The eve s3+7e 
might be such a critical Hb-target element. This is of course 
speculation, but in this scenario, otu' \vould expect that other Hb 
target enhancers, which produce expression overlapping the 
spatial and temporal patterns of eve s3+7e might also have 
experienced altered selection pressure. Thus, other Hb such target 
genes could also exhibit point mutations or deletions of conserved 
and presumably functional Hb binding sites. Note, we are not 
arguing positive selection is necessarily responsible; changes in Hb 
dose could lead to relaxation of selection for a subset of Hb target 
genes, and thus previously detrimental mutations in these genes 
could drift to higher frequency. 



[87,88,90-92] and dosage compensation [77,93]. Developmental 
networks must cope with variation due to chance, the internal and 
external environment, and in the relevant genetic components. 
Studies point to the involvement of positive selection in the gain 
and loss of TFBS in Dwsophila [(i6,94] and co-evolution within 
enhancers [39,95]. Furthermore, non-clocklike evolution of the s2e 
from four Drosophila species [11], indicates co-evolution of TF 
abundance and functional elements in cis-regulatory modules. The 
fact that two large deletions removing TFBS for Hb are found in 
close proximity in a regulatory element, might be an example of 
such co-evolution. However we favor the cautious explanation that 
these high frequency deletions reflect developmental system drift 
[40,41], i.e. permitted deviations in parameters of the Drosophila 
developmental regulatory network. 



Conclusions 

The genetic network governing early Drosophila development has 
been used to discover many of the basic principles of develop- 
mental genetics, regulatory DNA function and regulatory evolu- 
tion [6,10,88,89]. Recent technical and analytical improvements 
have enabled quantitative analyses of enhancer function and logic 



Supporting information 

Figure SI Constraints on SNPs and indels in regulatory 

DNA. A) The relationship between single base and indel 
polymorphism (summarized with 9) in 19 enhancers and 
promoters in D. melanogaster. Many of the characterized enhancers 
have no indels, and sit therefore at Y = 0. B) Size and frequency of 
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indels in characterized regulatory DNA and proximate promoters 
(dark circles) vs. indels in non-coding regions (open circles) around 
two developmental genes {hairy and EGFR). 
(TIFF) 

Figure S2 Comparative genomics of predicted Hb 
binding sites in eve. The strength (height of bar) and location 

of Hb binding sites predicted with JASPAR in the ~8 kb region 
upstream oi eve transcription start site, in four Dro.sophila species, A) 
D. melanogaster, B) D. sechellia, C) D. yakuha and D) D. pseudoohscura. 
The three characterized regulatory elements (the late element, 
stripes 3+7 enhancer and stripe 2 enhancer) are graphed as gray 
boxes in A), and the two predicted Hb sites (Hb8 to the left and 
Hbsl on the right) affected by the deletions in D. melanogaster are 
indicated by black circles. Coordinates are according to a 
manually edited Multiz alignment of 12 Drosophila species. 
(TIFF) 

Figure S3 Effects of Hb8A alleles on viability (above) 
and developmental time (below). Represented are least 
square mean estimates for combination of eve mutation (alleles 
and deficiency chromosomes), balancer (Cy) or mutation carrying 
chromosome (DF) and sex. Developmental time was estimated as 
the time to eclosion. 
(TIFF) 

Table SI Oligonucleotide primers used for PGR amplification, 

DNA sequencing and/ or genot^ping. Chimeric primers were used 
to PCR and sequence the eve locus, with a 5' tag corresponding to 
the M 1 3 universal sequencing primers (lowercase). 

(XLS) 

Table S2 Predicted Hb binding sites in the regulatory region 
upstream of eve, in 5 Drosophila species and the source alignments. 
Sheet one lists the Jaspar predicted Hb sites in D. melanogaster 
(D.met), D. simulans {D.sirri), D. sechellia (D.sec), D. pseudoohscura (D.pse) 
and D. yakuba {D.yaJcj. Coordinates are according to a manually 
edited Multiz alignment of 12 Drosophila species. Hb8 is at 4495 
and Hbsl is at 4871. See materials and methods for details. Sheet 
two contains multiple alignments of the eve region. 
PCLS) 

Table S3 Conservation of binding sites in the eve stripe 3+7 
enhancer. Transcription factor binding site numbering of sites 
follows Stanjovic et al 1989, Small et al 1996 and Yan et al 1996. 
Hb binding site 16 is on the opposite strand. Full species names 
and accession numbers are listed in material and methods. (*) 
indicate bases shared by two overlapping binding sites. (N/A) sites 
not identified in these species. Full species names and accession 
numbers are listed in material and methods. (*) indicate bases 
shared by two overlapping binding sites. (N/A) sites not identified 
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